ABSTRACT



A recently developed cost-effectiveness model for
on-line retrieval systems is discussed through an example
that uses performance results collected from several independent
sources and cost data derived from a recently completed study for the
American Psychological Association. One of the primary attributes of
the model is its great flexibility: various combinations
of alternative systems and subsystems are open to comparison. Some of
the systems which have been addressed include batch processing and
on-line abstract systems; the subsystems include various levels of recall,
several types of screening, and different user-system interfaces. The
example chosen for discussion in this paper presents a
cost-effectiveness comparison of on-line index and on-line abstract
systems for various levels of demand and recall. (Author)
COST-EFFECTIVENESS OF ON-LINE RETRIEVAL SYSTEM



An on-line system is by nature an interactive system characterized 
in part by the following features: 

1. Speed of response 

2. Ability to respond to requests for system parameters (e.g.,
number of documents with a given indexing)

3. Ability to process natural language

4. Option of using an intermediary

The iterative nature of an on-line system allows the direction of a
particular search to change at any stage during the entire search process.

Although there has been much discussion and indecision as to appropriate
measures of system effectiveness, the two which follow would appear adequate
for most circumstances:

1. Proportion of relevant documents retrieved

2. Total number of documents retrieved

It has been found that a stochastic model lends itself ideally to the 
evaluation of retrospective search systems. In the case of an on-line 
system the principal components of the model are: 

1. Intermediary relevance judgment, if an intermediary is used to 
conduct the search. 

2. Query/system relevance response, which is the system's response 
to a series of queries. 

3. Screened system relevance response, which corresponds to an
intermediary's judgment (if an intermediary is used) and to the
use of a document representation such as an abstract.

These components represent the various alternatives which may be combined 
to simulate any particular on-line retrieval system. The various sources of 
error may be expressed as conditional probabilities. The following notation 
will be used consistently: 
$V_r$ relevant with respect to the verbalized request

$V_{\bar r}$ nonrelevant with respect to the verbalized request

$C_r$ relevant with respect to the coder's (intermediary's) interpretation

$C_{\bar r}$ nonrelevant with respect to the coder's (intermediary's) interpretation

$R_r$ relevant with respect to the system's response

$R_{\bar r}$ nonrelevant with respect to the system's response

$S_r$ relevant with respect to the screener's judgment

$S_{\bar r}$ nonrelevant with respect to the screener's judgment



Conditional probabilities will be designated by the standard notation
P(A/B), which is read "the probability of A, given B." Thus $P(C_r/V_{\bar r})$
means "the probability that a document is relevant to the coder's
interpretation given that it is nonrelevant with respect to the verbalized request."
The conditional probabilities used in retrospective search models are:



[Arrays of conditional probabilities; the entries are not recoverable from the source. The surviving row and column labels are: relevance with respect to the verbalized request, relevance with respect to the coder's interpretation, and relevance with respect to the screener's interpretation.]
The conditional probabilities are determined through controlled observation,
although in practice one is always working with relative frequencies.

So constructed, the model has the following features:

(1) It shows the following summary figures: 

(a) the probability that a relevant document will be retrieved 

(b) the probability that a nonrelevant document will be retrieved 

(2) It shows the activities that are the principal sources of error 
through the entries for the conditional probabilities. Ideally, 
of course, all entries would be zeroes and ones, with the ones 
in the lower left-hand and upper right-hand corners. The amount 
of departure from this ideal indicates the extent of departure from 
perfection. 

(3) The effect of error-prone components on the total output of the system 
can be obtained. For example, it is possible to show what effect 
coder interpretation errors have on system performance. 

(4) It shows how a specified improvement in any component will affect
system output.

The model constitutes a simple application of the rules of probability
and can be described mathematically as a finite Markov chain with absorbing
states.

(1) $P(R_r/V_r) = P(R_r/C_r)\,P(C_r/V_r) + P(R_r/C_{\bar r})\,P(C_{\bar r}/V_r)$

(2) $P(R_r/V_{\bar r}) = P(R_r/C_r)\,P(C_r/V_{\bar r}) + P(R_r/C_{\bar r})\,P(C_{\bar r}/V_{\bar r})$

(3) $P(S_r, R_r/V_r) = P(S_r/V_r)\,P(R_r/V_r)$

(4) $P(S_r, R_r/V_{\bar r}) = P(S_r/V_{\bar r})\,P(R_r/V_{\bar r})$
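Equations (1) through (4) can be traced numerically with a short Python sketch. The function names and every probability value below are illustrative assumptions, not figures from the study; the only fact relied on beyond the equations is that the nonrelevant branch of the coder's interpretation is the complement of the relevant branch.

```python
def system_response(p_rr_given_cr, p_rr_given_cn, p_cr_given_v):
    """Equations (1)/(2): probability that the system responds 'relevant',
    given the document's status with respect to the verbalized request.
    P(C_rbar / V) is taken as 1 - P(C_r / V)."""
    return p_rr_given_cr * p_cr_given_v + p_rr_given_cn * (1.0 - p_cr_given_v)


def screened_response(p_sr_given_v, p_rr_given_v):
    """Equations (3)/(4): probability that both the system and the screener
    classify the document as relevant."""
    return p_sr_given_v * p_rr_given_v


# Hypothetical conditional probabilities (assumptions for illustration only).
p_rr_cr = 0.90   # P(R_r / C_r)
p_rr_cn = 0.10   # P(R_r / C_rbar)
p_cr_vr = 0.80   # P(C_r / V_r)
p_cr_vn = 0.05   # P(C_r / V_rbar)
p_sr_vr = 0.95   # P(S_r / V_r)
p_sr_vn = 0.20   # P(S_r / V_rbar)

p_rr_vr = system_response(p_rr_cr, p_rr_cn, p_cr_vr)   # equation (1)
p_rr_vn = system_response(p_rr_cr, p_rr_cn, p_cr_vn)   # equation (2)
p_hit = screened_response(p_sr_vr, p_rr_vr)            # equation (3)
p_false = screened_response(p_sr_vn, p_rr_vn)          # equation (4)

print(f"P(R_r/V_r)      = {p_rr_vr:.3f}")
print(f"P(R_r/V_rbar)   = {p_rr_vn:.3f}")
print(f"P(S_r,R_r/V_r)    = {p_hit:.3f}")
print(f"P(S_r,R_r/V_rbar) = {p_false:.3f}")
```

Changing a single input, for example P(C_r / V_r), and rerunning shows how an error-prone component propagates to the system output, which is the use described in features (3) and (4).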
The notation $P(S_r, R_r/V_r)$ indicates the probability that the system has
classified the document as relevant and that the screener has also
classified it as such, given that the document is, in fact, relevant
with respect to the verbalized request.

The theoretical recall ratio (proportion of relevant documents retrieved)
is $P(S_r, R_r/V_r)$. If $N_r$ is the number of documents in the file that
are relevant to a verbalized request and $N_{\bar r}$ the number nonrelevant, then
the theoretical precision ratio (relevant documents retrieved as a proportion
of all documents retrieved, relevant and nonrelevant) is:



$$\frac{N_r \, P(S_r, R_r/V_r)}{N_r \, P(S_r, R_r/V_r) + N_{\bar r} \, P(S_r, R_r/V_{\bar r})}$$

The number of documents retrieved may be found by

$$N_r \, P(S_r, R_r/V_r) + N_{\bar r} \, P(S_r, R_r/V_{\bar r})$$
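The recall ratio, precision ratio, and number of documents retrieved can be computed together, as in the following minimal sketch. The function name, the file sizes, and the two probabilities are hypothetical values chosen only to exercise the formulas.

```python
def retrieval_summary(n_rel, n_nonrel, p_hit, p_false):
    """Summary effectiveness figures.

    n_rel    -- N_r, relevant documents in the file
    n_nonrel -- N_rbar, nonrelevant documents in the file
    p_hit    -- P(S_r, R_r / V_r), the theoretical recall ratio
    p_false  -- P(S_r, R_r / V_rbar)
    """
    rel_retrieved = n_rel * p_hit
    nonrel_retrieved = n_nonrel * p_false
    retrieved = rel_retrieved + nonrel_retrieved
    # Precision is undefined when nothing is retrieved.
    precision = rel_retrieved / retrieved if retrieved > 0 else float("nan")
    return {"recall": p_hit, "precision": precision, "retrieved": retrieved}


# Hypothetical file: 100 relevant and 9,900 nonrelevant documents.
summary = retrieval_summary(n_rel=100, n_nonrel=9900, p_hit=0.70, p_false=0.03)
print(summary)
```

With these assumed inputs, even a small probability of retrieving a nonrelevant document dominates the output because the nonrelevant part of the file is so much larger.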

Thus far, the model results in measures of effectiveness rather than
efficiency, since no costs have been introduced.

We at Westat, under a contract with the American Psychological Association,
developed the following generalized cost model for retrospective search
systems. This model includes these subsystems:

(1) Search mode (on-line in this case) 

(2) Screening processes

(3) Input (full text versus index terms and number of items input)

(4) User/system interface 

(5) Method of presentation to the user 

The total cost of any retrospective search system, and therefore any on-line
retrieval system, is composed of three types of cost:

(1) Fixed costs associated with each subsystem

(2) Variable costs dependent on the number of items input to the system

(3) Variable costs dependent on the number of searches conducted
Simply stated,

$C = C' + C'' X_1 + C''' X_2$

The fixed costs associated with each subsystem are:

$C_1$ staff, space rental, computer rental, and fixed computer
storage charges for the specific computerized search system

$C_2$ rent, staff, and screening devices that may be used in
screening the search output

$C_3$ input costs such as thesaurus development, staff, tape
conversion, and update costs

$C_4$ staff, rent, and sundry items involved in the user/system interface

$C_5$ charges for mailing the search output to users

The fixed cost element is then

$C' = C_1 + C_2 + C_3 + C_4 + C_5$

The variable costs that are dependent on the file size or number of
items input to the system are:

$C_6$ the cost per item of indexing, abstracting, keyboarding, and
any other input processing

$C_7 X_5$ the file loading costs, which are dependent not only on
the file size, but also on the number of terms per item of input.

This cost component is then

$C'' = C_6 + C_7 X_5$

Another type of variable cost is dependent on the number of searches conducted
per year. This is the annual demand for the retrospective search system.
This is the most complicated element of the model because it is composed
of three parts:

(1) Fixed costs per search. These are $C_8$, the set-up costs for
mailing search output to users, and $C_9$, the cost of the user/system
interface, i.e., the intermediary.

(2) Costs dependent on the number of items retrieved in a search.
These are $C_{10}$, the computer cost of retrieving an item;
$C_{11}$, the cost of printing an item; and $C_{12}$, the cost of
screening each item retrieved.

(3) Costs dependent on the number of items mailed per search.
This is $C_{13}$, the cost of actually mailing the search output to
the user.

This cost component can be expressed as:

$C''' = C_8 + C_9 + X_3 (C_{10} + C_{11} + C_{12}) + X_4 C_{13}$
Combining the elements of the cost equation, we have:

$C = C_1 + C_2 + C_3 + C_4 + C_5 + X_1 (C_6 + C_7 X_5) + X_2 [C_8 + C_9 + X_3 (C_{10} + C_{11} + C_{12}) + X_4 C_{13}]$

where

$X_1$ = number of items input
$X_2$ = number of searches conducted
$X_3$ = number of items retrieved per search
$X_4$ = number of items mailed per search
$X_5$ = number of terms in authority list
$C_1$ = fixed cost associated with computing
$C_2$ = fixed cost associated with screening
$C_3$ = fixed cost associated with input
$C_4$ = fixed cost associated with user/system interface
$C_5$ = fixed cost associated with mailing results
$C_6$ = total input cost per item
$C_7$ = total file loading cost per item per term
$C_8$ = fixed cost of mailing per search
$C_9$ = fixed cost of user/system interface per search
$C_{10}$ = computer retrieval cost per item retrieved
$C_{11}$ = computer printing cost per item retrieved
$C_{12}$ = screening cost per item retrieved
$C_{13}$ = mailing cost per item mailed
$C$ = total annual cost
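The general cost equation translates directly into a small function. The sketch below is one possible transcription; the function name, the argument order, and all of the dollar amounts and volumes in the usage example are hypothetical and are not taken from Table 1.

```python
def total_annual_cost(fixed_costs, c6, c7, c8, c9, c10, c11, c12, c13,
                      x1, x2, x3, x4, x5):
    """Total annual cost C of a retrospective search system.

    fixed_costs -- iterable of the five fixed costs C1..C5
    c6..c13     -- unit costs as defined in the text
    x1..x5      -- items input, searches conducted, items retrieved per
                   search, items mailed per search, terms in authority list
    """
    c_fixed = sum(fixed_costs)                               # C'
    c_per_item = c6 + c7 * x5                                # C''
    c_per_search = c8 + c9 + x3 * (c10 + c11 + c12) + x4 * c13  # C'''
    return c_fixed + x1 * c_per_item + x2 * c_per_search


# Hypothetical inputs, for illustration only.
example_cost = total_annual_cost(
    fixed_costs=(50_000, 10_000, 20_000, 15_000, 5_000),
    c6=1.00, c7=0.01,
    c8=0.50, c9=10.00,
    c10=0.30, c11=0.05, c12=0.15, c13=0.01,
    x1=10_000, x2=1_000, x3=50, x4=20, x5=10,
)
print(f"Total annual cost: ${example_cost:,.2f}")
```

Keeping the three components as separate intermediate values makes it easy to see which of them dominates as the number of items input or the annual demand changes.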



This general equation can be used to estimate costs of potential search 
systems as well as to compare the cost/effectiveness trade-off of 
system alternatives. 

Table 1 shows various alternatives for on-line retrospective
searching along with associated effectiveness probabilities and cost
figures.

Using the figures noted in Table 1 and applying the model as outlined 
results in summary figures such as those shown in Table 2. 

Once effectiveness figures and system costs have been determined and 
the summary figures in Table 2 calculated, cost/effectiveness decision- 
making may begin. The weight to be assigned to each factor, of course, 
will depend upon specific system and organizational parameters, goals, 
objectives and operational constraints. It is most important that these 
factors be clearly understood by the cost/effectiveness team before the 
decision-making process is undertaken. 





Table 1 - Alternative On-line Retrospective Searching Processes: Effectiveness Probabilities and Costs

Table 2 - Summary Retrieval and Cost Figures Associated with Combinations of Alternative Searching Processes



w T 2 



c 

c 

3 



- CT 

r-r- Cfi 

a 2. 



* ? 
i 



c 

c 

3 



ct* Q 

IS 

c-h n 

o ^ 

w 



c - 

K — * 

n 

2 o 



p 

o 



3 

0 

X 

n 

n 

o 

c 

3 



O 

o 



0 

3 

1 



0 >‘ 

H»- 

3 

C- 

O 

X 



2 §? 

< a 



a 



p 



{ cn 3> 
S ^ £ 

j 2 ® 
s £1 3 

-* p 
! w cc 
! < 
Co <n 

HT 

3 



o 


o 


o 


• o 


iff 


• 


• 


« 


• 


o 


CO 


CO 


w 


CO 


p 


00 


CO 


Cl 


on 


h-» 

h-» 














4* 


4^ 


4^ ! 


i 0 ) o 


H* 


>-* 


-0 


-0 ! 


i rf * 


• 


• 


• 


• 


? *2 


| 05 

1 

t 


CT> 


CT 


on 


• Q 








I—* 

• 


! - 


CO 


h-i 


to ! 


j-* 2 
0 


1 <i 


>— i 


-0 


H* 1 


on 


00 


Cl 


00 


00 • 


l 


• 


• ' 


• 


-w- 




CO 


4^ 


CO 


S 

rt- 




















• 


I ^ 


H* 


H-i 


to 




H* 


00 




H* 


o o 


CO 


00 


Cl 


CO 


><! • 


• 


• 


• 


• 


4^ 


o 


Cl 


4^ 


CO 


^ Cft 
fD 










3 










«■+ 


^r 

Cl 

>U 

CT 

> 


rfr 

CO 

O* 

Cl 


-=4 

CT 

>-•> 

CO 

w 


CO 

CO 

h-l 

•» 


Total 

tern 


! oo 


to 


00 


-0 


o 

O “ 


o 


4^ 


00 


on 


Cl 


CO 


00 


CO 


CO ^ 
r-b Cfi 










< i__ 



: <# 

t h-* 

CO 

c: 

• 

* 4* 

1 -O 


$ 91.31 


$129.58 


$ 82.94 


System 

cost/ 

search 


1 *&* 


-64 




-£4 


>J 1-J o t s 


NO 


CO 


cc 


h-k 


0)0 o ^ 


• 


• 


• 


• 


r+£— Cfi 1 


. eo 


CO 




<3 


>1* rt ; 


CO 


o 


CO 


on 






C = $202,100 + 25,000 ($1.6825) + 4,000 [$0.20 + $11.25 + 79.9 ($0.3756 + $0.04) + 56.5 ($0.002)]

= $202,100 + $42,062 + 4,000 [$0.20 + $11.25 + $33.206 + $0.113]

= $244,162 + 4,000 [$44.769]

= $244,162 + $179,077.76

≈ $423,000
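The arithmetic of this worked example can be checked with a few lines of Python. The variable names are illustrative, and the role given to each number is read off its position in the general cost equation. One assumption is flagged in the code: the number of items input is taken as 25,000, the value consistent with the $42,062 input cost shown in the second step.

```python
# Figures from the worked example; names are illustrative.
fixed_costs = 202_100.00          # sum of fixed costs
items_input = 25_000              # ASSUMPTION: consistent with the $42,062 step
input_cost_per_item = 1.6825      # input and file loading cost per item
searches_per_year = 4_000
fixed_cost_per_search = 0.20 + 11.25
items_retrieved = 79.9            # items retrieved per search
cost_per_item_retrieved = 0.3756 + 0.04
items_mailed = 56.5               # items mailed per search
cost_per_item_mailed = 0.002

input_cost = items_input * input_cost_per_item
cost_per_search_term = (fixed_cost_per_search
                        + items_retrieved * cost_per_item_retrieved
                        + items_mailed * cost_per_item_mailed)
search_cost = searches_per_year * cost_per_search_term
total_cost = fixed_costs + input_cost + search_cost

print(f"Input cost:        ${input_cost:,.2f}")
print(f"Bracketed term:    ${cost_per_search_term:,.5f}")
print(f"Search cost:       ${search_cost:,.2f}")
print(f"Total annual cost: ${total_cost:,.2f}")
```

Rounded to the nearest thousand dollars, the total agrees with the $423,000 figure in the final step.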